Abstract. In previous work, we presented a symbolic execution method which starts with a concrete model of the program but progressively abstracts away details only when these are known to be irrelevant, using interpolation. In this paper, we extend the technique to handle unbounded loops. The central idea is to progressively discover the strongest invariants through a process of loop unrolling. The key feature of this technique, called the minimax algorithm, is intelligent backtracking which directs the search for the next invariant. We then present an analysis of the main differences between our symbolic execution method and mainstream techniques mainly based on abstraction refinement (CEGAR). Finally, we evaluate our technique against available state-of-the-art systems.

1 Introduction 

CounterExample-Guided Abstraction Refinement (CEGAR, or more briefly, AR) […] has been a very successful technique for proving safety in large programs. Starting with a coarse abstraction of the program (abstraction phase), the abstraction is checked for the desired property (verification phase). If no error is found, then the program is safe. Otherwise, an abstract counterexample is produced. The counterexample is then analyzed to test whether it corresponds to a concrete counterexample in the original program. If so, the program is reported as unsafe. Otherwise, a counterexample-driven refinement is performed to refine the abstract model such that the abstract counterexample is excluded (refinement phase), and the process starts again. Several systems following this approach have been developed in recent years […].

In previous work [17], we presented an algorithm dual to AR, here called Abstraction Learning, for loop-free program fragments. Essentially, our technique starts with the concrete model of the program. Then, the model is checked for the desired property (verification phase) via symbolic execution. If a counterexample is found, then it must be a real error and hence the program is unsafe. Otherwise, the program is safe. In order to make the symbolic execution process practical, the technique learns which facts are irrelevant for preserving the infeasibility of paths by computing interpolants (learning phase), and then eliminates those facts from the model (abstraction phase). Unfortunately, that work did not treat loops automatically; instead, it assumed user-provided loop invariants to make symbolic executions finite.

In this paper, we extend the technique proposed in [17] to discover loop invariants. The central idea is to progressively discover the strongest invariants through a lazy process of loop unrolling.

For a given loop, path-based loop invariants are computed and used to generalize the states at the looping points (program points where merging control paths form cycles). Our computation of invariants is lightweight, as the invariants are computed by manipulating explicit constraints with the theorem prover. The algorithm attempts to minimize the loss of information by computing the strongest possible invariants. These speculative invariants may still be too coarse to ensure safety. In that case, the algorithm computes interpolants to ensure that error locations are not reachable, resulting in selective unrolling at points where the path-based invariant can no longer be produced due to the strengthening introduced by the interpolants. As with AR, this procedure is only guaranteed to terminate when loop iterations are bounded.

A fundamental distinction with AR is that we always attempt to construct the most precise abstraction for loops by computing the strongest lightweight loop invariants. This feature is vital for detecting as many infeasible paths as possible during the traversal based on symbolic execution. Our thesis is that this investment often pays off, and even in examples where it does not, it is affordable.

The contributions of this paper can be summarized as follows: 

1. We extend the interpolation-based symbolic execution algorithm in [17] to deal with unbounded loops by describing a novel lazy loop unrolling algorithm called minimax.

2. We provide an analysis, using several academic examples, of the major differences between our proposed algorithm and mainstream techniques mainly based on abstraction refinement.

3. Finally, we implement the main ideas of this paper in a system called TRACER, and we evaluate it on real programs against BLAST, an available state-of-the-art system.

Related Work. Our work is clearly related to abstraction refinement (CEGAR) […]. We dedicate Sec. 3 to illustrating the main differences through some academic examples, and Sec. 6 to a comparison with BLAST on real programs.

Recent algorithms such as Synergy/DASH/SMASH […] use test-generation features to enhance the verification process. The main advantage comes from the use of the lightweight symbolic execution provided by DART [10] to mitigate the expensive cost of the abstract post-image operator when predicate abstraction is used. Our approach does not suffer from this drawback in the first place, since it is based on symbolic execution and does not use predicate abstraction. More importantly, these tools rely on CEGAR to build the abstract model of the program, and hence the major limitations observed in Sec. 3 still hold. Moreover, with our method, test cases bring no benefit unless there is a real counterexample in the program. On the contrary, we can construct reasonable scenarios in which Synergy and its descendants suffer an exponential slowdown with respect to our method, as shown in Sec. 3.

Our most closely related works are [16, 17], where interpolation was performed on a search tree of a CLP goal in pursuit of a target property. (The earlier paper [16] focused on a finite domain for an optimization problem.) But these works did not consider loops. The main conceptual advance of this paper is to address loops and, in doing so, to allow the consideration of real-life programs. Furthermore, this paper provides a detailed analysis of the differences with the state-of-the-art CEGAR method, and finally, we present a comprehensive experimental evaluation against BLAST, the most advanced CEGAR implementation available to us at this time.
Fig. 1. Interpolation and Subsumption of Infeasible Paths

Very recently, another interpolation-based symbolic execution method has been proposed, independently of ours, in [19]. This work can be considered in two parts. Regarding loop-free program fragments, this work is in fact subsumed by the earlier works [16, 17]. Regarding loops, [19] presented a naive strategy for handling loops based on an iterative deepening process. The central idea is to compute interpolants for a fixed depth in the hope that they will converge to inductive assertions after an expensive fixpoint computation. We quote from [19]: "the question of how to obtain convergence in practice for unbounded loops needs further study". Therefore, deriving a concrete algorithm from this idealized one is far from trivial. Furthermore, experimental evaluation was provided only for testing, and not for verification. In contrast, in this paper we present a directed approach which essentially amounts to an intelligent backtracking strategy that takes into account the reason for failure at the current stage.



2 The Basic Idea 

Our basic algorithm performs symbolic execution of the program while attempting to find an execution path that reaches the error() function. If no such path can be found, it concludes that the program is safe.

Consider the program in Fig. 1(a). We depict in Fig. 1(b) the naive symbolic execution tree, and in Fig. 1(c) a smaller tree which still proves the absence of bugs. During the traversal of the tree, our algorithm preserves the infeasibility of paths using the well-known concept of interpolation. Let us focus on Fig. 1(c) and consider, for instance, the path A = (0)-(1)-(3)-(5)-(7), which is detected as infeasible ($x = 0 \wedge y < 1 \wedge y > 1$). Applying our infeasibility preservation principle, we keep node (7) labeled with false. This produces the interpolant $y < 1$ at node (5), since this is the most general condition that preserves the infeasibility of node (7). Note that here $y < 1$ is entailed by the original state $x = 0 \wedge y < 1$ of node (5), and in turn $y < 1 \wedge y > 1 \models \mathit{false}$.

Now consider another path B = (0)-(1)-(3)-(5)-(6)-(7)-(8) and the node (8) with the formula $y < 1 \wedge x = 4 \wedge x > 5$, which is also infeasible. The node (7) can be interpolated to $x < 5$. As before, this produces the precondition $x < 1$ at (5). The final interpolant for (5) is the conjunction of $y < 1$ (produced from A) and $x < 1$ (produced from B). In this way, when (5) is visited through the path (0)-(1)-(3)-(4)-(5), the state cannot yet be subsumed, since the current context $y > 1 \wedge x = 2$ does not entail the interpolant stored at (5), namely $y < 1 \wedge x < 1$. After that, the symbolic execution continues normally until the prefix C = (0)-(1)-(2)-(3) is traversed. The formula $x = 0$ associated with the state at (3) entails the interpolant $x < 1$ at (3), and hence our algorithm finishes proving safety without traversing the whole subtree rooted at prefix C.
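The two subsumption tests in this example reduce to entailment checks. The C sketch below illustrates them under a loud assumption: instead of a theorem prover, entailment is checked by brute force over integers in a small hypothetical range [-8, 8], which is only a bounded approximation. The names `entails`, `interp5`, `ctx5_via4`, `interp3`, and `ctx3_via_c` are illustrative, not part of TRACER.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bounded domain standing in for a real theorem prover. */
#define LO (-8)
#define HI 8
#define MAXV 4

typedef bool (*Formula)(const int *v); /* here v[0] = x, v[1] = y */

/* Bounded check of a |= b over [LO, HI]^n by enumerating every assignment. */
bool entails(Formula a, Formula b, int n) {
    assert(n >= 0 && n <= MAXV);
    int v[MAXV];
    for (int i = 0; i < n; i++) v[i] = LO;
    for (;;) {
        if (a(v) && !b(v)) return false;   /* counter-model found */
        int i = 0;                          /* odometer-style increment */
        while (i < n && v[i] == HI) { v[i] = LO; i++; }
        if (i == n) return true;            /* every assignment visited */
        v[i]++;
    }
}

/* Interpolant stored at (5): y < 1 (from path A) and x < 1 (from path B). */
bool interp5(const int *v) { return v[1] < 1 && v[0] < 1; }
/* Context reaching (5) through (4): y > 1 and x = 2. */
bool ctx5_via4(const int *v) { return v[1] > 1 && v[0] == 2; }
/* Interpolant at (3), and the context x = 0 reaching (3) through prefix C. */
bool interp3(const int *v) { return v[0] < 1; }
bool ctx3_via_c(const int *v) { return v[0] == 0; }
```

With two variables, `entails(ctx5_via4, interp5, 2)` is false, so (5) must be explored again, while `entails(ctx3_via_c, interp3, 2)` is true, so the subtree after prefix C can be pruned.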

Loops. We now explain how our algorithm handles loops using a slightly modified classic example from [14], shown in Fig. 2(a). Essentially, it automatically infers path-based loop invariants using information learned during the traversal. The loop invariant constructed for a given path inside a loop is a conjunction of constraints whose truth values remain unchanged after one or more iterations of the loop. As with abstraction refinement, this process may require refinements when the abstraction is too coarse to prove the safety property.

In Fig. 2(b), assume the first path explored is (0)-(1)-(2)-(3)-(1'), denoting a cyclic path from location (1) back to the same program point, here denoted (1'). Note that (1') and (1) correspond to the same program point; we use primed versions to distinguish multiple occurrences. Our algorithm then examines the constraints at the entry of the loop (i.e., lock==0, new==old+1, flag==1) to discover those whose truth values remain unchanged after one iteration (where lock==1, new==old, flag==1 hold). Clearly, the constraints lock==0 and new==old+1 are no longer satisfied, while flag==1 still holds.

At this point, our algorithm produces an abstraction at location (1) by making the truth values of lock==0 and new==old+1 unknown. In this way, the constraints at (1') now entail the modified constraints of (1) (flag==1), achieving parent-child subsumption. Assume the next explored path is (0)-(1)-(2)-(3)-(4)-(1'') (Fig. 2(b)). At (1''), the constraints already entail the generalized constraint of (1) (it is invariant), and we therefore stop the traversal.
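The test of which entry constraints "remain unchanged" can be phrased as entailment: an entry constraint survives if the state reached at (1') still implies it. The C sketch below does this for the lock example, assuming (as a stand-in for the theorem prover) a brute-force entailment check over the hypothetical range [-4, 4]; the helper `path_invariant` is hypothetical, and annotations and locking are ignored here.

```c
#include <assert.h>
#include <stdbool.h>

#define LO (-4)
#define HI 4
#define NV 4 /* v[0] = lock, v[1] = new, v[2] = old, v[3] = flag */

typedef bool (*Formula)(const int *v);

/* Bounded check of a |= b over [LO, HI]^NV (stand-in for a theorem prover). */
bool entails(Formula a, Formula b) {
    int v[NV];
    for (int i = 0; i < NV; i++) v[i] = LO;
    for (;;) {
        if (a(v) && !b(v)) return false;
        int i = 0;
        while (i < NV && v[i] == HI) { v[i] = LO; i++; }
        if (i == NV) return true;
        v[i]++;
    }
}

/* Constraints at the loop entry (1). */
bool lock_eq_0(const int *v) { return v[0] == 0; }
bool new_eq_old_plus_1(const int *v) { return v[1] == v[2] + 1; }
bool flag_eq_1(const int *v) { return v[3] == 1; }

/* State reached at (1') after one iteration: lock==1, new==old, flag==1. */
bool state_1prime(const int *v) { return v[0] == 1 && v[1] == v[2] && v[3] == 1; }

/* keep[i] is true iff entry constraint i still holds at the looping point,
   i.e. it is individually invariant along this path. Returns how many are kept. */
int path_invariant(const Formula *entry, int m, Formula looping, bool *keep) {
    assert(m >= 0);
    int kept = 0;
    for (int i = 0; i < m; i++) {
        keep[i] = entails(looping, entry[i]);
        if (keep[i]) kept++;
    }
    return kept;
}
```

On this path only flag==1 is kept, matching the generalized state at (1) described above.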

After the loop is traversed, the remaining constraint at (1) is flag==1, and this is in fact a loop invariant discovered by the algorithm. Since we have removed new==old+1 from (1), the exit path of the loop now becomes feasible, as the condition new==old becomes satisfiable. For this reason, the traversal reaches (5) with the constraint flag==1 propagated from (1) and new==old, which is obtained by strongest postcondition propagation through the loop exit transition.

Since we keep flag==1 in the loop invariant at (1), the algorithm manages to infer that the path (0)-(1)-(5)-(6) is infeasible (Fig. 2(b)). One important point here is that the algorithm exits the loop with maximal information. This is useful for detecting as many infeasible paths as possible. An AR algorithm would not detect this infeasibility and would visit error() at (8).

Next, our algorithm visits the nodes (7) and (8), also in Fig. 2(b), where (8) is an error location. The path is spurious, and the algorithm discovers, using interpolation, that one of the reasons for the reachability of this point is the removal of new==old+1 at (1). The algorithm decides to lock new==old+1 at (1) and restarts the traversal from (1). Locking declares that the constraint cannot be removed when generating a loop invariant. This is our main mechanism to ensure progress.
Fig. 2. Loops

The next traversal after the locking is depicted in Fig. 2(c). As in the first traversal, the path (1)-(2)-(3)-(1') is traversed again. At (1'), the constraints no longer entail the constraints of (1). Because new==old+1 is locked, we are prevented from generating a loop invariant, and hence subsumption does not hold. As a result, the traversal continues, and it completes without visiting the error program point at (8).

An essential observation is that, due to its directed search for loop invariants, the algorithm does not unroll the location (1'') (Fig. 2(c)), since the state there is already subsumed by (1) without the need to force any abstraction. A naive iterative deepening algorithm (e.g., [19]) would also unroll that path, and we can construct reasonable scenarios in which this leads to an exponential explosion.



3 Comparison with the State-Of-The-Art 

We now analyze essential differences between our approach and mainstream techniques 
which are mainly based on abstraction refinement (CEGAR). 

Exploration of Infeasible Paths. The core idea of abstraction refinement is to use the most general abstraction first, and refine later. This causes the exploration of infeasible paths, which significantly aggravates well-known problems of AR. First, the more predicates are considered in the abstract model, the more costly the verification phase becomes. Moreover, if predicate abstraction is used (e.g., SLAM and BLAST), expensive abstract post-image computations and quantifier elimination are needed. Finally, the cost of the refinement process may also be prohibitive.

Because of the huge impact of exploring infeasible paths, significant research on this problem has been done recently. A partial solution has been the use of DART to provide a symbolic execution engine in Synergy-like tools […]. However, the construction of the abstract model is still needed and the above problems persist. Furthermore, these tools may perform unnecessary refinements, and one can construct reasonable scenarios in which this leads to exponential behavior. Consider the program in Fig. 3(a). Assume that a Synergy-like tool produces the test case (1)-(2)-(3)-(7)-(10), and that the abstract model, with no predicates, reaches the error through the path (1)-(2)-(3)-(5)-(6)-(8)-(9). The tool then tries to produce new test cases by negating the first constraint which is not in the common prefix (i.e., x>0), but this is unsatisfiable since $x = 0$. Therefore, it will likely add the predicate $x \le 0$, which is irrelevant for proving safety. Our technique will traverse the path through (1)-(2)-(3)-(7)-(8) and produce the interpolant $q = 1$. The remaining paths will entail that interpolant, and hence the behavior will be linear in the size of the program.

(c)
    s=0;
    if (*) z=0;
    else z=999;
    // 1
    if (*) s++;
    else s+=2;
    ...
    // N
    if (*) s++;
    else s+=2;
    if (s+z>2*N && z==0)
      error();

(d)
    1:  if (*) {
    2:    x=0;
    3:    y=0;
        }
    4:  else {
    5:    x=complex_func();
    6:    y=0;
        }
    7:  s=x;
    8:  t=y;
        /*1*/
    9:  if (*) {s++;t++;}
        ...
        /*N*/
    10: if (*) {s++;t++;}
    11: if (t>N && s>N) error();

Fig. 3. Several Programs

Discovering Loop Invariants. Any symbolic traversal method will eventually have to discover loop invariants that are strong enough for the proof process to conclude successfully. In the case of AR, the abstract model is refined from spurious counterexamples by discovering which predicates can refute the error path, and in this process these predicates are hoped to be in fact invariant through loops.

A crucial observation is that the inference of invariant predicates can significantly speed up the convergence of loops […]. We therefore employ invariant discovery by searching for the strongest invariants. This principle is also in accordance with our philosophy of performing concrete symbolic execution in order to maintain exact information for loop-free fragments.

Fig. 3(b) illustrates the benefits of computing the strongest loop invariants. AR will discover the predicates $(n = 0), (n = 1), \ldots, (n = N - 1)$ and also $(y = 0), \ldots, (y = N)$, and hence full unrolling of the loop is needed. To understand why our approach avoids full unrolling, the concept of inferring path-based loop invariant constraints is essential. Consider the path (1)-(2)-(3)-(4)-(5)-(3'). The state at (3') can be specified by the constraint $y = 0 \wedge n = 0 \wedge n < N \wedge y' = y + 1 \wedge n' = n + 1$ over the variables $\tilde{x}$ and $\tilde{x}'$. Our algorithm will attempt to infer which constraints are individually invariant in order to obtain parent-child subsumption (i.e., to "close" the loop). It is straightforward to see that $y \ge 0$ (obtained by slackening $y = 0$ to $y \ge 0$) is invariant through the loop, because when (3) is first visited, $y = 0 \Rightarrow y \ge 0$ holds, and after one iteration the constraints $y \ge 0 \wedge y' = y + 1$ still imply $y' \ge 0$. The second essential step is that when the exit condition is taken (i.e., along (1)-(3)-(6)), our technique attaches $n \ge N$ to all invariant constraints (in this case $y \ge 0$) by computing the strongest postcondition. More importantly, those two constraints ($y \ge 0 \wedge n \ge N$) suffice to prove that the error condition $y + n < N$ is false. Therefore, we are done with only one iteration through the loop.
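The entailments in this argument can be checked mechanically. The sketch below assumes a bounded brute-force check over [-6, 6] in place of a real decision procedure (so it is evidence, not proof), treats N as one more variable, and uses hypothetical function names throughout.

```c
#include <assert.h>
#include <stdbool.h>

#define LO (-6)
#define HI 6
#define NV 4 /* v[0] = y, v[1] = y' (next iteration), v[2] = n, v[3] = N */

typedef bool (*Formula)(const int *v);

/* Bounded check of a |= b over [LO, HI]^NV, a stand-in for a theorem prover. */
bool entails(Formula a, Formula b) {
    int v[NV];
    for (int i = 0; i < NV; i++) v[i] = LO;
    for (;;) {
        if (a(v) && !b(v)) return false;
        int i = 0;
        while (i < NV && v[i] == HI) { v[i] = LO; i++; }
        if (i == NV) return true;
        v[i]++;
    }
}

bool init_y(const int *v)   { return v[0] == 0; }                     /* y = 0 */
bool y_ge_0(const int *v)   { return v[0] >= 0; }                     /* slackened */
bool step_eq(const int *v)  { return v[0] == 0 && v[1] == v[0] + 1; } /* y = 0, y' = y + 1 */
bool step_ge(const int *v)  { return v[0] >= 0 && v[1] == v[0] + 1; } /* y >= 0, y' = y + 1 */
bool yp_eq_0(const int *v)  { return v[1] == 0; }                     /* y' = 0 */
bool yp_ge_0(const int *v)  { return v[1] >= 0; }                     /* y' >= 0 */
bool exit_st(const int *v)  { return v[0] >= 0 && v[2] >= v[3]; }    /* y >= 0, n >= N */
bool no_error(const int *v) { return !(v[0] + v[2] < v[3]); }         /* not (y + n < N) */
```

The check that `step_eq` does not entail `yp_eq_0` shows why slackening is needed: $y = 0$ itself is not preserved by $y' = y + 1$, whereas $y \ge 0$ is, and at the exit $y \ge 0 \wedge n \ge N$ rules out the error condition.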

Using Newly Discovered Predicates in Future Traversal. Another fundamental question in AR is: after the set of predicates required to exclude the spurious counterexample has been discovered, how should those predicates be used in other paths? Consider our next program, in Fig. 3(c).

A counterexample-guided tool will discover the predicates $(s = 0), (s = 1), \ldots, (s = 2N)$. Then, it will add either $(z = 0)$ or $(z = 999)$. Assume that it first adds the predicate $(z = 0)$. The key observation is that all the paths that include $z = 999$ (location (3)) will be traversed considering all the predicates discovered from paths that included $z = 0$ (location (2)), and hence the traversal will be exponential.

Our algorithm will perform essentially the same amount of work for the case where $z = 0$ is considered. However, it traverses the paths that include $z = 999$ without considering the facts learnt from paths that include $z = 0$, since it only keeps track of the concrete state collected so far (i.e., $s = 0 \wedge z = 999$). Then, after the path (1)-(3)-(4)-...-(6)-(8) is traversed, we can discover in a straightforward manner that $z = 999$ suffices to refute the error state, and hence the rest of the paths will be subsumed. Notice that AR will also discover the predicate $(z = 999)$ after the counterexample is found. The essential difference, for this class of programs, is that AR also uses the previously discovered predicates $(s = 0), (s = 1), \ldots, (s = 2N)$, and hence its traversal will be significantly affected by them.

Running an Abstract State Hampers Subsumption. The next example illustrates another potential weakness of AR that is not present in our approach. Even if locality is well exploited, the likelihood of subsuming the currently traversed state may be diminished because the state, being abstract, is too coarse. Consider now the program in Fig. 3(d), and assume that complex_func always returns 0.

In principle, a counterexample-guided tool will behave very much as it does on the program in Fig. 3(c). Assume that the prefix path (1)-(2)-(3) is taken. The tool will then discover the predicates $(x = 0), (s = 0), (s = 1), \ldots, (s = N), (y = 0), (t = 0), (t = 1), \ldots, (t = N)$. Again, those predicates are likely to be used during the exploration of the else-branch ((4)-(5)-(6)). However, an essential difference with respect to the program in Fig. 3(c) is that, although the discovered predicate $(x = 0)$ is taken into consideration, the abstract state cannot be covered: assuming the tool does not lazily consider the value returned by complex_func, the abstract state is too coarse to entail the predicate $(x = 0)$. In contrast, since our method performs a systematic propagation of the program state, the value returned by complex_func will be captured, and we will be able to entail the interpolant $x = 0$. The main consequence is that the state will now be subsumed.

Unnecessary Detection of Infeasible Paths. So far we have illustrated scenarios where our approach behaves better than AR. The advantage exploited in the preceding examples is the preservation of infeasible paths while abstracting loops using the strongest lightweight loop invariants. Unfortunately, this characteristic might be an important downside if the program can be proved safe even when infeasible paths are traversed, since all the work of generating interpolants for preserving infeasible paths would then be wasted.

We claim that eager detection of infeasible paths, even if they are not relevant to the safety property, is not limiting in practice. The reason is that many of the infeasible paths in real programs must be considered anyway to block the error paths, and hence counterexample-guided approaches will also consider them, albeit lazily and at a higher price later on. The results obtained by our prototype on real programs, shown in Sec. 6, strongly support our view.



To elaborate on this point, let us consider a real program, statemate […], commonly used for testing WCET tools. The program is generated automatically and its main feature is its huge number of infeasible paths. We try to build the worst possible scenario by instrumenting the program: we add x=0 as the first statement of the program, where x is a fresh variable, and then add the condition if (x>0) error() at the end. An AR tool should add only the predicate $(x = 0)$ to prove that the program is bug-free. However, an actual evaluation using BLAST shows significant performance degradation, as it may not always choose the right predicate: it discovers 21 predicates in 74 seconds on an Intel 2.33 GHz machine with 3.2 GB of memory (our algorithm takes 88 seconds). This experiment exhibits the worst possible scenario for our approach and also illustrates another potential limitation of AR: if the abstract error path has more than one reason for infeasibility, existing refinement techniques have difficulty choosing the right refinement. Synergy-like tools mitigate this problem but introduce other challenges, as discussed above.

4 Formalities 

Here we briefly model a program as a transition system and formalize the proof process as one of producing a closed tree of transition steps. It is convenient to use the formal framework of Constraint Logic Programming (CLP) [13], which we outline as follows.

The universe of discourse is a set of terms, integers, and arrays of integers. A con- 
straint is written using a language of functions and relations. 

An atom is of the form $p(\tilde{t})$, where $p$ is a user-defined predicate symbol and $\tilde{t}$ a tuple of terms. A rule is of the form $p(k, \tilde{x}) \mathrel{:-} p(k', \tilde{x}') \wedge c$, where the atom $p(k, \tilde{x})$ is the head of the rule, and the atom $p(k', \tilde{x}')$ and the (conjunction of) constraint(s) $c$ (possibly relating the variables $\tilde{x}$ and $\tilde{x}'$) constitute the body of the rule. Here both $k$ and $k'$ are positive numbers denoting program points, or the special constant error denoting an error location. We may omit either the atom or the constraint from the body. A goal has exactly the same format as the body of a rule. Given a goal $Q \equiv p(k, \tilde{x}) \wedge \phi$, we denote by $\mathit{cons}(Q)$ the constraint $\phi$, or true when $\phi$ is empty.

Each CLP rule represents a transition in the program.¹ For example, given a program fragment with two variables x and y, the assignment 5: x = y+1 6: is represented as the rule $p(5, x, y) \mathrel{:-} p(6, x', y') \wedge y' = y \wedge x' = y + 1$. For a conditional 6: if (x>0) 7:, we represent the transition between (6) and (7) by the rule $p(6, x, y) \mathrel{:-} p(7, x', y') \wedge y' = y \wedge x' = x \wedge x > 0$.

A substitution simultaneously replaces each variable in a term or constraint with some expression. We specify a substitution by the notation $[\tilde{e}/\tilde{x}]$, where $\tilde{x}$ is a sequence $x_1, \ldots, x_n$ of variables and $\tilde{e}$ a list $e_1, \ldots, e_n$ of expressions, such that $x_i$ is replaced by $e_i$ for all $1 \le i \le n$. Given a substitution $\theta$, we write $e\theta$ for the application of the substitution to an expression $e$. A renaming is a substitution which maps variables into variables. A grounding is a substitution which maps each variable into a value in its domain. A ground instance of a constraint, atom, and rule is defined in the obvious way.
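To make "simultaneously" concrete, the sketch below applies a substitution to linear integer expressions only (an assumption; the CLP framework allows arbitrary terms). Because the result is assembled from the original coefficients of $e$, an image such as $y + 1$ is never substituted again. The type `Lin` and the function `subst` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NV 2 /* variables: index 0 = x, index 1 = y */

/* Linear expression c0 + coef[0]*x + coef[1]*y. */
typedef struct { int c0; int coef[NV]; } Lin;

/* Computes e theta for theta = [theta[0], theta[1] / x, y]. The result is built
   only from e's original coefficients, so images are never re-substituted:
   the replacement is simultaneous, not sequential. */
Lin subst(Lin e, const Lin theta[NV]) {
    assert(theta != NULL);
    Lin r = { e.c0, {0} };
    for (int j = 0; j < NV; j++) {             /* variable being replaced */
        r.c0 += e.coef[j] * theta[j].c0;
        for (int k = 0; k < NV; k++)
            r.coef[k] += e.coef[j] * theta[j].coef[k];
    }
    return r;
}
```

For instance, applying $[y + 1, x / x, y]$ to $x + 2y$ yields $1 + 2x + y$, and the swap $[y, x / x, y]$ turns $x - y$ into $y - x$; replacing x and then y one after the other would instead collapse the swap to 0.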

¹ For lack of space, we refer readers to [17] and its references for more details about the translation from transition systems to CLP programs.



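Given a goal $Q \equiv p(k, \tilde{x}) \wedge \Psi(\tilde{x})$, $[\![Q]\!]$ is the set of the groundings $\theta$ of the primary variables $\tilde{x}$ such that $\exists^{*}\Psi(\tilde{x})\theta$ holds. A goal $Q \equiv p(k, \tilde{x}) \wedge \Psi(\tilde{x})$ subsumes another goal $Q' \equiv p(k', \tilde{x}') \wedge \Psi'(\tilde{x}')$ if $k = k'$ and $[\![Q]\!] \supseteq [\![Q']\!]$. Equivalently, we say that $Q$ is a generalization of $Q'$. We write $Q_1 \equiv Q_2$ if $Q_1$ and $Q_2$ are generalizations of each other.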

We use the notion of reduction to represent the symbolic strongest postcondition operation. Let a rule $R \equiv p(k, \tilde{x}) \mathrel{:-} p(k', \tilde{x}') \wedge c$ belong to a CLP program. Given a goal $Q \equiv p(k, \tilde{x}_i) \wedge \Psi$ with variables disjoint from $R$, a reduct or derivation of $Q$ using $R$ (denoted $\mathit{reduct}_R(Q)$) is the goal $p(k', \tilde{x}_{i+1}) \wedge \Psi \wedge c[\tilde{x}_i/\tilde{x}][\tilde{x}_{i+1}/\tilde{x}']$. A derivation sequence (path) is a sequence of goals $Q_0, Q_1, \ldots$ where $Q_i$, $i > 0$, is a reduct of $Q_{i-1}$.

A goal $Q \equiv p(k, \tilde{x}) \wedge c$ is called terminal if there are no applicable rules to perform reduction on it, and it is called looping if it is derived from another goal with the same $k$ (called its looping parent) through one or more reduction steps. A goal is infeasible if its constraints are unsatisfiable, and a derivation sequence is called infeasible when it ends in an infeasible goal.

5 Algorithm: Minimax 

As mentioned above, there is an obvious strategy for dealing with loops: iterative deepening on the level of loop unrolling, generating loop invariants in each iteration. In this section, we present an algorithm that performs unrolling in an intelligent manner, using information about why a particular path does not suffice. In this regard, our algorithm resembles CEGAR, where, if a candidate loop invariant is found insufficient (too weak), the refinement process takes into account the reason for this insufficiency in order to arrive at the next refinement.

Our algorithm maintains knowledge about a state (goal) $Q \equiv p(k, \tilde{x}) \wedge c_1 \wedge \ldots \wedge c_n$ by means of a vector $v = \langle \alpha^1, \ldots, \alpha^n \rangle$, where each $\alpha^i$ is an annotation of one of the following kinds:

• a max annotation, indicating that the constraint $c_i$ must be kept,

• a min annotation, indicating that the constraint $c_i$ must be deleted, or

• a neutral annotation.

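Denote the $i$-th annotation in $v$ by $\alpha^i_v$. Let $c$ be a constraint; its annotation is denoted $\alpha_v(c)$. $\mathit{neut}(\tilde{c})$ denotes a vector $\langle \mathit{neutral}, \ldots, \mathit{neutral} \rangle$ of the same length as $\tilde{c}$. We write $\mathit{conflict}(v_1, v_2)$ if $\exists\, 1 \le i \le \min\{\mathit{length}(v_1), \mathit{length}(v_2)\}$ such that $\alpha^i_{v_1} = \mathit{min}$ and $\alpha^i_{v_2} = \mathit{max}$.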
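A minimal C encoding of annotation vectors, assuming an enum for the three kinds of annotation (`A_NEUTRAL`, `A_MIN`, `A_MAX` are hypothetical names), with `neut` and `conflict` written directly from the definitions above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { A_NEUTRAL, A_MIN, A_MAX } Ann;

/* neut(c): a vector of n neutral annotations. */
void neut(Ann *v, size_t n) {
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++) v[i] = A_NEUTRAL;
}

/* conflict(v1, v2): within the common prefix, some position is min in v1
   and max in v2. The relation is not symmetric. */
bool conflict(const Ann *v1, size_t n1, const Ann *v2, size_t n2) {
    size_t n = n1 < n2 ? n1 : n2;
    for (size_t i = 0; i < n; i++)
        if (v1[i] == A_MIN && v2[i] == A_MAX) return true;
    return false;
}
```

Note that conflict is directional: $\langle \mathit{min}, \mathit{neutral}, \mathit{max} \rangle$ conflicts with $\langle \mathit{max}, \mathit{min} \rangle$ at the first position, but not vice versa.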

A pair $(Q, v)$, where the state $Q \equiv p(k, \tilde{x}) \wedge \tilde{c}$, is called an annotated state. The meaning of an annotated state $\sigma = (Q, v)$ is obtained in two ways. A max interpretation $\sigma_{max}$ is the state obtained by deleting all but the max-annotated constraints in $\tilde{c}$. Dually, a min interpretation $\sigma_{min}$ is the state obtained by including all but the min-annotated constraints in $\tilde{c}$. For example, given the annotated state $\sigma: (p(5, x_1, x_2, x_3) \wedge x_1 = 1 \wedge x_2 = 2 \wedge x_3 = 3, \langle \mathit{min}, \mathit{neutral}, \mathit{max} \rangle)$, $\sigma_{min}$ and $\sigma_{max}$ are, respectively, the two states $p(5, x_1, x_2, x_3) \wedge x_2 = 2 \wedge x_3 = 3$ and $p(5, x_1, x_2, x_3) \wedge x_3 = 3$. Note that $\mathit{cons}(\sigma_{max})$ is weaker than $\mathit{cons}(\sigma_{min})$.

The use of vectors is an efficient way of computing interpolants. The max-annotated constraints of an annotated state $\sigma = (Q, v)$ must be kept to preserve some infeasible paths in the derivation tree emanating from $\sigma$. Given an infeasible annotated state $\sigma'$ derived from $\sigma$, we minimally max-annotate the constraints in $\sigma'$ (some of which are constraints of $\sigma$, since they share the vector) such that the infeasibility is maintained. In this way we immediately obtain an abstraction at $\sigma$ (that is, $\sigma_{max}$) from the max annotations generated at $\sigma'$, without performing weakest precondition propagation or any approximation of it. $\sigma_{max}$ subsumes $Q$, yet it entails the infeasibility of $\sigma'$, and it is therefore an interpolant. The final abstraction at $\sigma$ is the conjunction of the interpolants returned by the children, which is easily obtained as the conjunction of all max-annotated constraints at $\sigma$ after the subtree is traversed.

The algorithm operates on annotated states. Its depth-first traversal is outlined in Fig. 4. When encountering a loop (point L in Fig. 4), a loop invariant is produced by weakening the constraints at L, minimally min-annotating its state. This weakening is then applied in the forward execution of the points beyond L. This abstraction, however, is not the final abstraction used to subsume other states, since it can still be weakened further: some constraints may not contribute to the infeasibility or subsumption of descendant states. The final abstract state (at L or elsewhere) is computed by propagating max annotations backward in post-order. Max annotations are produced by interpolation at points where infeasibility and subsumption are found (Lines 2, 5 and 10 in Fig. 5).
Before detailing the algorithm of Fig. 5, we first explain its main components.



Fig. 4. Min-Max



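Interpolation. If $\sigma$ is $(p(k, \tilde{x}) \wedge \tilde{c}, v)$ and $\mathit{cons}(\sigma_{min}) \wedge \phi$ is unsatisfiable, $\mathit{interpolate}(\sigma, \phi)$ returns an annotation vector $v'$ which has the same length as $v$ (and $\tilde{c}$), satisfying the following:

1. $\forall c \in \tilde{c}: \alpha_v(c) \in \{\mathit{min}, \mathit{max}\} \Rightarrow \alpha_{v'}(c) = \alpha_v(c)$,

2. $\forall c \in \tilde{c}: \alpha_v(c) = \mathit{neutral} \Rightarrow \alpha_{v'}(c) \in \{\mathit{neutral}, \mathit{max}\}$, and

3. $\sigma' = (p(k, \tilde{x}) \wedge \tilde{c}, v') \Rightarrow \mathit{cons}(\sigma'_{max}) \wedge \phi$ is unsatisfiable.

$v'$ is computed by turning the fewest neutral annotations of $v$ into max annotations, thus representing a computation of an interpolant: $\sigma'_{max}$ maintains the unsatisfiability (consequence) of $\sigma_{min}$, yet it has fewer constraints (it is more general). For example, consider the annotated state $\sigma: (p(k, x_1, x_2) \wedge \tilde{c}, \langle \mathit{min}, \mathit{max}, \mathit{neutral}, \mathit{neutral} \rangle)$ with $\tilde{c}$ being $x_1 > 3 \wedge x_1 = y_1 + 1 \wedge y_1 = 2 \wedge x_2 = 0$, and the constraint $\phi: x_1 < 0$. Here, $\mathit{cons}(\sigma_{min}) \wedge \phi$ is unsatisfiable. Then $\mathit{interpolate}(\sigma, \phi)$ produces the vector $v': \langle \mathit{min}, \mathit{max}, \mathit{max}, \mathit{neutral} \rangle$. That is, the third constraint's annotation is changed from neutral to max so that $\mathit{cons}(\sigma'_{max}) \wedge \phi$ remains unsatisfiable.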

Subsumption and Loop Invariants. An essential feature of our algorithm, also present in AR methods, is the ability to block the forward traversal of an annotated state $\sigma$ if there exists another state $\sigma'$, already processed, such that the state of $\sigma$ entails the state associated with $\sigma'$. During the symbolic traversal there are two kinds of subsumption.



Parent-Child: assume that $\sigma'$ is a looping ancestor of $\sigma$. Here $\sigma'$ would be of the form $(p(k,\tilde x') \wedge \tilde c_1, \tilde v_1)$ and $\sigma$ of the form $(p(k,\tilde x) \wedge \tilde c_1 \wedge \tilde c_2, \tilde v)$, where $\tilde v = \tilde v_1 \cdot \tilde v_2$ with $\tilde v_2$ of the same length as $\tilde c_2$. Since we would like to unroll as few times as possible, the algorithm forces (if possible) parent-child subsumption by computing the strongest path-based loop invariant. To this end, $\tilde v_1$ can be replaced with a vector $\tilde v_1^I$ of the same length, in which some neutral annotations of $\tilde v_1$ (those whose constraints are not individually invariant) are transformed into min annotations, such that $\sigma'_{min}$ subsumes $\sigma_{min}$. The function $invariant(\sigma, \sigma')$ returns the vector $\tilde v_1^I \cdot \tilde v_2$ if subsumption holds. Otherwise, parent-child subsumption is not possible and the function returns $\bot$. This is our mechanism to lazily unroll loops.
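The following Python sketch illustrates the idea behind invariant on a toy loop; it is an assumption-laden illustration, not the paper's implementation. We assume a single integer loop variable, a hypothetical function `step` giving its value after one traversal of the loop path, and brute-force checks over a finite box in place of a theorem prover. Neutral constraints that are not individually preserved by `step` become min; the remaining constraints must then be preserved as a whole, otherwise $\bot$ (here `None`) is returned.

```python
MIN, MAX, NEUTRAL = "min", "max", "neutral"

# Assumption: a single integer loop variable and a finite box in place of a
# theorem prover; `step` is a hypothetical function giving the value of the
# variable after one traversal of the loop path.
DOMAIN = range(-20, 21)


def preserved(pre, post, step):
    """For all x in DOMAIN satisfying every constraint in `pre`,
    step(x) satisfies every constraint in `post`."""
    return all(
        all(c(step(x)) for c in post)
        for x in DOMAIN
        if all(c(x) for c in pre)
    )


def invariant(parent_cs, v1, v2, step):
    """Sketch of invariant(sigma, sigma'): returns v1^I . v2, or None for bottom."""
    v1_inv = list(v1)
    for i, (c, a) in enumerate(zip(parent_cs, v1)):
        # A neutral constraint that is not individually invariant along the
        # loop path is abstracted away, i.e. turned into a min annotation.
        if a == NEUTRAL and not preserved([c], [c], step):
            v1_inv[i] = MIN
    kept = [c for c, a in zip(parent_cs, v1_inv) if a != MIN]
    # Parent-child subsumption: the abstracted child state must entail the
    # abstracted parent state after one more traversal of the path.
    if preserved(kept, kept, step):
        return v1_inv + list(v2)
    return None


parent_cs = [lambda x: x >= 0, lambda x: x == 0]
step = lambda x: x + 1  # loop body: x := x + 1
print(invariant(parent_cs, [NEUTRAL, NEUTRAL], [NEUTRAL], step))
# ['neutral', 'min', 'neutral']
print(invariant(parent_cs, [NEUTRAL, MAX], [NEUTRAL], step))
# None: x == 0 is max-annotated (locked), so it cannot be abstracted
```

The second call shows why max annotations that are set too early are harmful: a locked constraint that is not invariant prevents parent-child subsumption, so the loop has to be unrolled further.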

Sibling-Sibling: assume now that the state $\sigma'$ has already been processed and stored in a memo table, $M_T$. The condition here is that the current state associated with $\sigma$ entails the interpolant associated with $\sigma'$; that is, $\sigma'_{max}$ subsumes $\sigma_{min}$. This test is done by the function $subsumed(M_T, \sigma)$ in the algorithm. If the test holds, this function also returns a subsuming state $\sigma^{sub}$; otherwise, it returns $\bot$.

For $\sigma^{sub}$ we need to distinguish two subcases. If $\sigma'$ is out of the scope of a loop, then $\sigma^{sub} = \sigma'$. Otherwise, as in the case of parent-child subsumption, we may need to convert some neutral annotations into min annotations to communicate to ancestors the conditions under which the subsumption took place: in particular, those neutral annotations which, had they been max annotations, would have prevented the subsumption.
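A minimal Python sketch of the sibling-sibling test follows, covering only the subcase where $\sigma'$ is out of the scope of a loop (so $\sigma^{sub} = \sigma'$). The `AnnotatedState` record, the string labels for program points, and the brute-force entailment over a finite box are all hypothetical stand-ins for the tool's data structures and theorem prover.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

MIN, MAX, NEUTRAL = "min", "max", "neutral"
DOMAIN = range(-20, 21)  # assumption: brute-force stand-in for a solver
Constraint = Callable[[int], bool]


@dataclass
class AnnotatedState:
    point: str              # program point k (hypothetical string label)
    cs: List[Constraint]    # constraints c~ over one integer variable
    vector: List[str]       # annotations v~, one per constraint

    def cons_min(self) -> List[Constraint]:
        return [c for c, a in zip(self.cs, self.vector) if a != MIN]

    def cons_max(self) -> List[Constraint]:
        return [c for c, a in zip(self.cs, self.vector) if a == MAX]


def entails(lhs: List[Constraint], rhs: List[Constraint]) -> bool:
    """Every x in DOMAIN satisfying lhs also satisfies rhs."""
    return all(all(c(x) for c in rhs) for x in DOMAIN if all(c(x) for c in lhs))


def subsumed(memo: List[AnnotatedState], sigma: AnnotatedState) -> Optional[AnnotatedState]:
    """Return a memoed state at the same point whose interpolant (sigma'_max)
    is entailed by sigma_min, or None for bottom."""
    for other in memo:
        if other.point == sigma.point and entails(sigma.cons_min(), other.cons_max()):
            return other
    return None


memo = [AnnotatedState("L7", [lambda x: x >= 0, lambda x: x < 5], [MAX, NEUTRAL])]
print(subsumed(memo, AnnotatedState("L7", [lambda x: x == 3], [NEUTRAL])) is memo[0])   # True
print(subsumed(memo, AnnotatedState("L7", [lambda x: x == -1], [NEUTRAL])))             # None
```

Only the max-annotated part of the memoed state matters: the neutral constraint $x < 5$ plays no role in the test, which is precisely what makes the interpolant more general than the memoed state itself.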

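Merging Vectors. We use two operations for merging min and max annotations, respectively. Given two vectors $\tilde v_1$ and $\tilde v_2$:

mergemin($\tilde v_1, \tilde v_2$): if the condition $\forall 1 \le i \le \min\{length(\tilde v_1), length(\tilde v_2)\}: \alpha^i_{\tilde v_1} = min \Rightarrow \alpha^i_{\tilde v_2} \in \{neutral, min\}$ holds, then it returns a vector $\tilde v$ satisfying $\forall 1 \le i \le length(\tilde v_1): (\alpha^i_{\tilde v_1} = min \Rightarrow \alpha^i_{\tilde v} = min) \wedge (\alpha^i_{\tilde v_1} \neq min \Rightarrow \alpha^i_{\tilde v} = \alpha^i_{\tilde v_2})$. Otherwise, the function returns $\bot$.

mergemax($\tilde v_1, \tilde v_2$): always returns a vector $\tilde v$ satisfying $\forall 1 \le i \le \max\{length(\tilde v_1), length(\tilde v_2)\}: ((\alpha^i_{\tilde v_1} = max \vee i > length(\tilde v_2)) \Rightarrow \alpha^i_{\tilde v} = \alpha^i_{\tilde v_1}) \wedge ((\alpha^i_{\tilde v_1} \neq max \vee i > length(\tilde v_1)) \Rightarrow \alpha^i_{\tilde v} = \alpha^i_{\tilde v_2})$.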
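The two merge operations can be sketched directly in Python, with annotation vectors as lists of strings and `None` standing for $\bot$. This is a sketch under one stated assumption: for mergemin, the definition above leaves positions beyond $length(\tilde v_2)$ unspecified when $\tilde v_1$ is longer, and we read them as keeping $\tilde v_1$'s annotation. For mergemax, positions beyond the shorter vector are taken from the longer one; within both, $\tilde v_1$ wins exactly where it is max.

```python
from typing import List, Optional

MIN, MAX, NEUTRAL = "min", "max", "neutral"


def mergemin(v1: List[str], v2: List[str]) -> Optional[List[str]]:
    """Propagate the min annotations of v1 into v2; None stands for bottom."""
    shared = min(len(v1), len(v2))
    # Defined only if no min annotation of v1 falls on a max annotation of v2.
    if any(v1[i] == MIN and v2[i] == MAX for i in range(shared)):
        return None
    # Assumption: positions beyond len(v2) keep v1's annotation.
    return [
        MIN if v1[i] == MIN else (v2[i] if i < len(v2) else v1[i])
        for i in range(len(v1))
    ]


def mergemax(v1: List[str], v2: List[str]) -> List[str]:
    """Max annotations of v1 win; elsewhere v2's annotation is taken."""
    out = []
    for i in range(max(len(v1), len(v2))):
        if i >= len(v2) or (i < len(v1) and v1[i] == MAX):
            out.append(v1[i])
        else:
            out.append(v2[i])
    return out


# Line 25 of Fig. 5: v' := mergemax(v''', mergemin(v''', v')).
child = [MIN, MAX, NEUTRAL]    # v''' after wp
acc = [NEUTRAL, NEUTRAL, MAX]  # v' accumulated from earlier siblings
print(mergemax(child, mergemin(child, acc)))  # ['min', 'max', 'max']
print(mergemin([MIN], [MAX]))                 # None
```

In the usage example, the child's min and max annotations and the earlier sibling's max annotation all survive the merge, which is the intended effect of combining the vectors returned by each descendant.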

The Minimax routine takes as inputs the depth $D$ of the symbolic tree, a current annotated state $\sigma$, and the table $\Theta_T$, which records the ancestor states that can potentially become the looping parent of the current state. A global table, $M_T$, stores the interpolants already computed. The execution starts with $D = 0$, an initial state $\sigma_{init}$ whose annotations are all neutral, and an empty $\Theta_T$. The memo table $M_T$ is also initially empty.

Line 2 handles the case when the state is infeasible. Here, max annotations are created using the procedure interpolate to indicate the constraints that are needed in order to preserve the unsatisfiability of the constraints.

Lines 3-7 handle the case when the error program point is visited with a feasible $\sigma_{min}$. If $\sigma$ itself is feasible ($\tilde c$ is satisfiable), we have found a real error, and the algorithm aborts (Line 4). If $\sigma$ is infeasible, we have found a spurious state, which is visited due to the weakening caused by min annotations. At Line 5 we compute max annotations such that the infeasibility is preserved. At Line 6 we compute the shallowest depth at which the conflict occurs, from which we are to restart. We then add the computed max annotations to the input vector and return the resulting vector together with a CONFLICT status and the computed depth (Line 7).



Minimax(D, σ, Θ_T) returns (OK, ṽ, b) or (CONFLICT, ṽ, b) with vector ṽ and integer b
let σ be (Q, ṽ) and Q be p(k, x̃) ∧ c̃
switch (σ)
1:  case cons(σ_min) unsatisfiable:
2:    return (OK, interpolate(σ, true), 0)
3:  case k = error:
4:    if (c̃ is satisfiable) abort
5:    ṽ' := interpolate((Q, neut(c̃)), true)
6:    d := min{ℓ | (ℓ, (Q'', ṽ'')) ∈ Θ_T and conflict(ṽ'', ṽ')}
7:    return (CONFLICT, mergemax(ṽ', ṽ), d)
8:  case Q is terminal: return (OK, ṽ, 0)
9:  case there is σ^sub = subsumed(M_T, σ) and σ^sub ≠ ⊥:
10:   return (OK, interpolate(σ, ¬conj(σ^sub_max)), 0)
11: case S = {σ' | (ℓ, σ') ∈ Θ_T and σ' looping parent of σ} ≠ ∅:
12:   foreach σ' ∈ S
13:     if (ṽ' = invariant(σ, σ') and ṽ' ≠ ⊥) return (OK, ṽ', 0)
14:   goto default
default:
15:   ṽ' := ṽ, σ' := (Q, ṽ')
16:   foreach Q' in red⁺_min(σ) ... red⁻_min(σ)
17:     let cons(Q') be c̃ ∧ c̃'
18:     (Status, ṽ'', d) := Minimax(D + 1, (Q', ṽ' · neut(c̃')), Θ_T ∪ {(D, σ)})
19:     ṽ''' := wp((Q', ṽ''), c̃')
20:     if (Status = CONFLICT)
21:       if (d = D)
22:         M_T := M_T \ {σ' | σ' = (Q'', _) and Q'' derived from Q}
23:         return Minimax(D, (Q, ṽ'''), Θ_T)
24:       else return (CONFLICT, ṽ''', d)
25:     ṽ' := mergemax(ṽ''', mergemin(ṽ''', ṽ'))
26:   M_T := M_T ∪ {(Q, ṽ')}
27:   return (OK, ṽ', 0)



Fig. 5. The Minimax Algorithm 

Line 8 is selected if the end of the path is reached. Here it is not necessary to add either min or max annotations, as looping points or infeasible/subsumed states can no longer be reached; we therefore return OK and the input annotation itself.

Lines 9-10 handle the case in which the current state is subsumed by another state already memoed in $M_T$. Recall the notion of subsuming state $\sigma^{sub}$ explained previously. Here we return OK with a vector that extends the input vector $\tilde v$ with the max annotations needed to ensure the entailment (i.e., the unsatisfiability of the negation of the constraints) of the abstraction of the subsuming state, $\sigma^{sub}_{max}$.

Lines 11-14 handle the case in which the current state is looping. Here we attempt to compute a path-based loop invariant using min annotations produced by the invariant subprocedure (Line 13), in order to force parent-child subsumption. If invariant fails to produce such an abstraction, we continue to the default case (Line 14).

The default case at Lines 15-27 performs one symbolic execution step. We first formalize the functions used in Line 16. Let $\sigma$ be $(Q, \tilde v)$; we denote by $red^{+}_{min}(\sigma)$ the set $\{Q' \mid \exists R : cons(reduct_R(\sigma_{min}))$ is satisfiable $\wedge\ Q' = reduct_R(Q)\}$. Similarly, we denote by $red^{-}_{min}(\sigma)$ the set $\{Q' \mid \exists R : cons(reduct_R(\sigma_{min}))$ is unsatisfiable $\wedge\ Q' = reduct_R(Q)\}$. In essence, $red^{+}_{min}(\sigma)$ ($red^{-}_{min}(\sigma)$) is the set of reducts of $Q$ such that, using the same rules, the reduction of $\sigma_{min}$ is feasible (infeasible). In this way, the loop in Line 16 makes us prioritize transitions that are feasible, possibly due to min abstraction. This is important so as not to generate max annotations that restrict abstraction too early, which would result in failure to discover loop invariants later (inability to convert max into min annotations at Line 13). The loop then iterates in sequence over the reducts, performing recursive calls to Minimax (Line 18). The result for each one is a triple (Status, $\tilde v''$, $d$), where $\tilde v''$ contains max annotations that specify how the current state needs to be abstracted. At Line 19 we compute the abstraction for the current state from the returned annotation using the function wp, which denotes an approximation of the weakest precondition. In our framework, this can be done trivially, without calling the theorem prover, by cutting off the last $|\tilde c'|$ elements of $\tilde v''$ (here $\tilde c'$ are the constraints added by the reduction).
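The ordering of Line 16 and the cheap wp of Line 19 can be sketched in a few lines of Python. In this sketch the reducts are opaque hypothetical values, and `feasible_under_min` is a caller-supplied predicate standing in for the solver query on $cons(reduct_R(\sigma_{min}))$; neither name comes from the paper.

```python
from typing import Callable, List, Sequence, TypeVar

R = TypeVar("R")


def order_reducts(reducts: Sequence[R],
                  feasible_under_min: Callable[[R], bool]) -> List[R]:
    """Line 16: visit red+_min(sigma) before red-_min(sigma)."""
    plus = [r for r in reducts if feasible_under_min(r)]
    minus = [r for r in reducts if not feasible_under_min(r)]
    return plus + minus


def wp(child_vector: Sequence[str], num_added: int) -> List[str]:
    """Line 19: cut off the annotations of the |c'| constraints that the
    reduction appended, giving a vector for the current state."""
    # Slice by explicit length: child_vector[:-0] would wrongly yield [].
    return list(child_vector[: len(child_vector) - num_added])


# Hypothetical reducts of a branch where only the else-branch is feasible
# under sigma_min.
print(order_reducts(["then", "else"], lambda r: r == "else"))  # ['else', 'then']
print(wp(["min", "max", "neutral", "max"], 2))                 # ['min', 'max']
print(wp(["min", "max"], 0))                                   # ['min', 'max']
```

Because annotation vectors are positional, and the reduction only appends the new constraints $\tilde c'$ at the end, truncation is sufficient here and no solver call is needed.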

If Status is CONFLICT with depth $d$ (produced at Line 7), we know that a conflict occurred somewhere during the recursive call. If $d$ is equal to the current depth $D$, then this is the topmost point at which the conflict originated. In addition, we know that $\tilde v'''$ (the vector after calling wp) is the same as $\tilde v$, but with some max annotations replacing min annotations. In essence, we "lock" such annotations. More importantly, this may result in failure to create a loop invariant at Line 13 later. At Line 22 we clean the memo table of those states derived from the current input state. We then perform another recursive call (Line 23) as a replacement for the current call (without making any transition step), using $\tilde v'''$ to propagate the locked annotations. If the current depth is not equal to the conflict depth, we simply propagate the conflict to the parent (Line 24).
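The decision made at Lines 20-24 can be sketched as follows. The memo-table layout (each entry recording the ids of the ancestors on its derivation) and the state ids are hypothetical; the paper only states that entries derived from the current input state are removed. The function returns which of the three outcomes applies rather than performing the recursive call itself.

```python
from typing import Dict, List, Tuple

# Hypothetical memo layout: state id -> (ids of ancestors on its derivation, vector).
Memo = Dict[str, Tuple[Tuple[str, ...], List[str]]]


def clean_derived(memo: Memo, state_id: str) -> Memo:
    """Line 22: drop memo entries whose derivation passes through state_id."""
    return {k: v for k, v in memo.items() if state_id not in v[0]}


def after_child(status: str, v3: List[str], d: int, depth: int,
                memo: Memo, state_id: str):
    """Lines 20-24: decide what to do with a child's result.

    Returns (action, vector, memo), where action is 'continue' (merge as in
    Line 25), 'restart' (re-run Minimax at this depth with the locked vector v3),
    or 'propagate' (return the conflict to the parent).
    """
    if status != "CONFLICT":
        return "continue", v3, memo
    if d == depth:  # topmost point where the conflict originated
        return "restart", v3, clean_derived(memo, state_id)
    return "propagate", v3, memo


memo = {"s3": (("s0", "s1"), ["max"]), "s4": (("s0",), ["neutral"])}
action, vec, new_memo = after_child("CONFLICT", ["max"], 1, 1, memo, "s1")
print(action, sorted(new_memo))  # restart ['s4']
print(after_child("CONFLICT", ["max"], 0, 2, memo, "s1")[0])  # propagate
```

Only entries whose derivation passes through the restarted state are discarded; interpolants computed in unrelated subtrees (here `s4`) remain usable for subsumption.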

Finally, Line 25 combines the vectors returned by each descendant, and after all vectors are merged, Line 26 stores in the memo table the interpolant for the current goal.

We conclude this section by mentioning that the central step of deleting constraints, the effect of a min annotation, can in fact be relaxed to some other mechanism that abstracts the state at hand. Instead of deleting a constraint, one could transform it. For example, one could apply a process of "slackening" to an equation $x = y$ to obtain an inequality, either $x \le y$ or $x \ge y$. This kind of abstraction is in fact employed in the BLAST system, which we benchmark against, but we do not currently use it in our own experiments. Even more generally, we could replace not one but a collection of constraints by another collection which is entailed by the original collection.
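As a small illustration of slackening, the Python sketch below weakens an equation between two variables into the two inequalities it entails and checks, by brute force over a small integer box (an assumption in place of a solver), that each result is indeed entailed by the original while the converse fails. The tuple encoding of constraints is hypothetical.

```python
import operator
from itertools import product

# Hypothetical encoding: a constraint is (lhs_var, op, rhs_var).
OPS = {"==": operator.eq, "<=": operator.le, ">=": operator.ge}


def slacken(c):
    """Weaken an equation x = y into the inequalities x <= y and x >= y."""
    lhs, op, rhs = c
    if op != "==":
        return [c]  # only equations are slackened in this sketch
    return [(lhs, "<=", rhs), (lhs, ">=", rhs)]


def holds(c, env):
    lhs, op, rhs = c
    return OPS[op](env[lhs], env[rhs])


def entails(c, d, vars_=("x", "y"), dom=range(-5, 6)):
    """Brute-force check (assumption) that c implies d on a small box."""
    for vals in product(dom, repeat=len(vars_)):
        env = dict(zip(vars_, vals))
        if holds(c, env) and not holds(d, env):
            return False
    return True


eq = ("x", "==", "y")
print(slacken(eq))                                # [('x', '<=', 'y'), ('x', '>=', 'y')]
print(all(entails(eq, w) for w in slacken(eq)))   # True
print(entails(("x", "<=", "y"), eq))              # False
```

Either inequality on its own can replace the equation as a weaker but still sound abstraction, which is the property required of any replacement collection of constraints.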

6 Experimental Evaluation 

We implemented our prototype, TRACER, which models the C heap using the theory of arrays, uses alias analysis to partition it, and inlines functions. We ran TRACER on several programs instrumented with safety properties and compared it with BLAST [6]. We downloaded all programs, already instrumented with safety conditions, from …, together with a script which runs them with the most favorable system options.

2 We also tried ARMC, but we only managed to run it on tcas and statemate, and in both cases it timed out, after 30m and 1h respectively.



The results are summarized in Fig. 6. We present two sets of numbers: for BLAST, the number of discovered predicates (P) and the total time in seconds (T); for TRACER, our prototype tool, the number of nodes of the exploration tree (S) and again the total time in seconds (T). Although the number of discovered predicates and the number of nodes in the exploration tree are not directly comparable, they are shown to give an idea of the hardness of each proof.

In summary, TRACER is competitive with BLAST on most of the benchmark examples, and sometimes much faster. However, there are two programs where BLAST is faster (tcas-1a and tcas-2a). We believe the main reason is that TRACER performs some extra work on unnecessary infeasible paths. Nevertheless, the numbers show that the differences are not significant.

Note that programs such as cdaudio, floppy, and serial are marked with the symbol '*' in the BLAST column, which means that BLAST raised an exception and aborted. Therefore, we were not able to verify those programs using BLAST. Although we could not contact the BLAST authors, we are aware that cdaudio and floppy have been proved safe in [13] after 21m59s and 11m17s, discovering 196 and 156 predicates, respectively, on a Pentium 2.4GHz with 512MB.

Fig. 6. BLAST Benchmarks on Intel 2.33GHz, 3.2GB

The cases where the programs were proved unsafe deserve special mention. In these cases, TRACER found a real counterexample much faster than BLAST. The reason is that TRACER blocks infeasible paths and then very quickly finds the real error, whereas BLAST spends time performing refinements and traversing parts of the state space that are irrelevant to the real error path. Nevertheless, this is an example where we believe that Synergy-like tools using test cases would perform as well as ours, since DART could also find the real error path faster.



7 Concluding Remarks 

We extended Abstraction Learning, an interpolation-based symbolic execution method, to automatically handle unbounded loops. The algorithm is an intelligent unrolling process that classifies constraints into min and max constraints. The min constraints are those which must be abstracted in order to achieve subsumption and loop invariance, while the max constraints are those which must not be abstracted, so as to detect infeasible paths and also to preserve safety. The idea is to have as few of these two kinds of constraints as possible. We discussed the relative merits of our method and AR-based methods using academic examples. We also evaluated our prototype, TRACER, against BLAST, the most advanced system available to us, using real programs. The results show competitive performance, with some examples showing significant improvement. In all cases, the results show that eagerly detecting infeasible paths can be efficient.